Semiautomated pelvic lymph node treatment response evaluation for patients with advanced prostate cancer: based on MET-RADS-P guidelines

Background The evaluation of treatment response according to METastasis Reporting and Data System for Prostate Cancer (MET-RADS-P) criteria is an important but time-consuming task for patients with advanced prostate cancer (APC). A deep learning-based algorithm has the potential to assist with this assessment.
Objective To develop and evaluate a deep learning-based algorithm for semiautomated treatment response assessment of pelvic lymph nodes.
Methods A total of 162 patients who had undergone at least two scans for follow-up assessment after APC metastasis treatment were enrolled. A previously reported deep learning model was used to perform automated segmentation of pelvic lymph nodes. The performance of the deep learning algorithm was evaluated using the Dice similarity coefficient (DSC) and volumetric similarity (VS). The consistency of the short diameter measurement with the radiologist was evaluated using Bland–Altman plotting. Based on the segmentation of lymph nodes, the treatment response was assessed automatically with a rule-based program according to the MET-RADS-P criteria. Kappa statistics were used to assess the accuracy and consistency of the treatment response assessment by the deep learning model and two radiologists [attending radiologist (R1) and fellow radiologist (R2)].
Results The mean DSC and VS of the pelvic lymph node segmentation were 0.82 ± 0.09 and 0.88 ± 0.12, respectively. Bland–Altman plotting showed that most of the lymph node measurements were within the upper and lower limits of agreement (LOA). The accuracies of automated segmentation-based assessment were 0.92 (95% CI: 0.85–0.96), 0.91 (95% CI: 0.86–0.95) and 0.75 (95% CI: 0.46–0.92) for target lesions, nontarget lesions and nonpathological lesions, respectively. The consistency of treatment response assessment based on automated segmentation and manual segmentation was excellent for target lesions [K value: 0.92 (0.86–0.98)], good for nontarget lesions [0.82 (0.74–0.90)] and moderate for nonpathological lesions [0.71 (0.50–0.92)].
Conclusion The deep learning-based semiautomated algorithm showed high accuracy for the treatment response assessment of pelvic lymph nodes and demonstrated comparable performance with radiologists.


Background
Advanced prostate cancer (APC) is characterized by the recurrence of prostate cancer after definitive treatment or by metastases without prior therapy [1]. Several therapeutic approaches have been approved for patients with APC. Aside from androgen deprivation and docetaxel treatment, new agents with varying mechanisms of action have shown survival benefits in this population [2,3]. However, the responses of patients with APC to these agents vary, and treatment may cause side effects without producing the desired outcome. Early treatment response assessment therefore allows clinicians to stop unbeneficial treatment in a timely manner.
Imagery depicting metastatic state plays a key role in patient management [4,5]. There is a growing body of research demonstrating how whole-body magnetic resonance imaging can be used to diagnose and evaluate APC tumors and determine the efficacy of treatment [6,7]. The METastasis Reporting and Data System for Prostate Cancer (MET-RADS-P) guidelines aim to reduce variability in the acquisition, interpretation, and reporting of metastatic cancer by promoting standardization of practices [8]. As recommended by the Prostate Cancer Clinical Trials Working Group (PCWG), MET-RADS-P allows the subclassification of patients based on their metastatic spread pattern (bone, nodal, visceral, or local) [5].
Diffusion-weighted imaging (DWI) has been shown to successfully reflect tumor response and discriminate between future responders and nonresponders, which could be valuable in adapting future management [9]. Manual segmentation and measurement of DWI lesions based on MET-RADS-P require a high level of expertise, are time-consuming, and are subject to operator error [10,11]. Deep learning technologies have extended this quantitative approach with promising preliminary results in the assessment of tumor response in the liver [12,13]. In this study, we hypothesized that the deep learning model could also be trained to estimate the treatment response of APC according to MET-RADS-P guidelines. This study aimed to investigate the feasibility of deep learning-based treatment response evaluation of patients with APC, and for proof-of-concept, we focused on the assessment in the pelvic lymph nodes.

Patient enrollment
This study was approved by the local institutional review board, and the requirement for informed consent was waived due to its retrospective design. Two hundred and fifty-nine patients with histologically confirmed prostate cancer who underwent initial/curative treatment of metastases at our institution were included in this study between Jan 2017 and Jan 2022. Pelvic MRI scans were performed before and after at least one course of treatment (baseline and posttreatment).
According to the MET-RADS-P criteria, lymph nodes with a short diameter < 10 mm were considered nonpathological; therefore, only patients with at least one lymph node ≥ 10 mm at baseline MRI were eligible. Hence, 23 of the 259 patients with APC were excluded because all of their lesions had a short diameter < 10 mm. In addition, the recommended interval between baseline pelvic MRI and treatment initiation is within 4 weeks; therefore, 45 patients were excluded due to an interval of more than 4 weeks. Twelve patients were excluded because of an unqualified scanning range on baseline and follow-up MRI. Fifteen patients were excluded for inadequate image quality. Finally, 162 patients who had undergone at least two scans for follow-up assessment after APC metastasis treatment were analyzed (Fig. 1). Clinical and radiological features of the enrolled patients were acquired from the electronic information system, including age, prostate-specific antigen (PSA) level, PI-RADS v2.1 scores and TNM staging.

MRI acquisition
Three 3.0 T scanners were used (Achieva, Philips Healthcare; Discovery MR750, GE Healthcare; Intera, Philips Healthcare) to perform pelvic MRI scans. The pelvic MRI protocol performed in our institution included T2-weighted imaging (T2WI), T1WI, DWI with apparent diffusion coefficient (ADC) maps and dynamic gadolinium-DTPA (Gd-DTPA)-enhanced (DCE) sequences. The detailed scanning parameters of DWI are listed in Table 1.

Pelvic lymph nodes segmentation
A previously trained deep learning-based 3D U-Net segmentation model, developed by the authors of this study, was used to automatically segment the visible pelvic lymph nodes on DWI images [14]. The training data used for the model development were different from the data included here. All visible lymph nodes were categorized as target lesions (short diameter ≥ 15 mm), nontarget lesions (10 mm ≤ short diameter < 15 mm) or nonpathological lesions (short diameter < 10 mm). Manual corrections of the automatically segmented lymph nodes made by an expert radiologist (with more than 20 years of reading experience) were considered the reference standard for segmentation evaluation.
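The three size categories above can be expressed as a small lookup. The following is a minimal sketch using only the thresholds stated in the text; the function name `classify_lymph_node` is hypothetical and is not taken from the authors' software.

```python
def classify_lymph_node(short_diameter_mm: float) -> str:
    """Map a lymph node short diameter (mm) to the category used in the text.

    Thresholds follow the definitions above:
    target >= 15 mm, nontarget 10 to < 15 mm, nonpathological < 10 mm.
    """
    if short_diameter_mm < 0:
        raise ValueError("short diameter must be non-negative")
    if short_diameter_mm >= 15.0:
        return "target"
    if short_diameter_mm >= 10.0:
        return "nontarget"
    return "nonpathological"


if __name__ == "__main__":
    # Illustrative diameters only, not study data.
    for diameter in (23.5, 12.0, 7.4):
        print(diameter, classify_lymph_node(diameter))
```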

Treatment response assessment
Based on the MET-RADS-P criteria, treatment response assessments of lymph nodes were conducted [15], including complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD).
The radiologists who manually corrected the lymph node segmentations provided the reference standard for treatment response assessment. An algorithm for semiautomatic response assessment was developed according to the MET-RADS-P criteria: it first calculates the diameters of the lymph nodes automatically and then assesses the treatment response with a rule-based program. More details about the algorithm development for pelvic lymph nodes are given in our previous study [14].
In addition, an attending radiologist (R1) and a fellow radiologist (R2), with 8 and 4 years of pelvic imaging experience, respectively, performed the treatment response assessments on all patients by primary review of the MRI images. For every patient, the two radiologists compared the baseline scan before treatment with the subsequent scans after treatment. The definition and evaluation rules are shown in Fig. 2.
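To make the idea of a rule-based response program concrete, here is a minimal sketch for target lesions. The exact rules used in this study are those of Fig. 2 and [14] and are not reproduced in the text, so the thresholds below are an assumption: they follow the widely used RECIST 1.1-style conventions (all nodes < 10 mm for CR, a decrease of at least 30% in the diameter sum for PR, an increase of at least 20% and at least 5 mm for PD). As a further simplification, change is measured against baseline rather than against the nadir. The function name `assess_target_response` is hypothetical.

```python
from typing import Sequence


def assess_target_response(baseline_mm: Sequence[float],
                           followup_mm: Sequence[float]) -> str:
    """Return CR, PR, SD or PD for matched target nodes (short diameters, mm).

    ASSUMPTION: RECIST 1.1-style thresholds, compared against baseline.
    This is not the authors' rule set, which is defined in Fig. 2.
    """
    if not baseline_mm or len(baseline_mm) != len(followup_mm):
        raise ValueError("need matched, non-empty diameter lists")

    # Complete response: every node has shrunk below the 10 mm threshold.
    if all(d < 10.0 for d in followup_mm):
        return "CR"

    baseline_sum = sum(baseline_mm)
    absolute_change = sum(followup_mm) - baseline_sum
    relative_change = absolute_change / baseline_sum

    # Progressive disease: >= 20% increase and >= 5 mm absolute increase.
    if relative_change >= 0.20 and absolute_change >= 5.0:
        return "PD"
    # Partial response: >= 30% decrease in the diameter sum.
    if relative_change <= -0.30:
        return "PR"
    return "SD"


if __name__ == "__main__":
    # Illustrative values only, not study data.
    print(assess_target_response([20.0, 16.0], [12.0, 12.0]))
```

Matching each node between baseline and follow-up is assumed to have been done beforehand; the Discussion notes that automated registration of nodes across time points remains future work.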

Statistical analysis
The "median (interquartile range)" values are used for the description of continuous variables, and descriptive statistics of the categorical data are presented with "n (%)". The segmentation results are quantitatively evaluated by the overlap-based metric [Dice similarity coefficient (DSC)] and the volume-based metric [volumetric similarity (VS)] [16]. The independent t-test was applied to determine the difference in the evaluation metrics between the subgroups. We used the Kappa statistic to evaluate the consistency of treatment response. A P value less than 0.05 was treated as significant. Statistical analysis was performed with MedCalc (version 14.8; MedCalc Software, Ostend, Belgium).
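As an illustration of the agreement statistic, the sketch below computes an unweighted Cohen's kappa for two sets of response labels. Whether the study used the unweighted or a weighted form is not stated, so the unweighted form is an assumption, and the function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same cases."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("need two non-empty label lists of equal length")

    n = len(rater_a)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from the marginal label frequencies of each rater.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label for every case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Illustrative labels only, not study data.
    algorithm = ["CR", "PR", "SD", "PD", "PR", "SD"]
    radiologist = ["CR", "PR", "SD", "PD", "SD", "SD"]
    print(round(cohen_kappa(algorithm, radiologist), 3))
```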

Study population
In this study, 162 eligible APC patients with metastases were included. The baseline characteristics of the enrolled patients are shown in Table 2.

Assessment of automated lymph node segmentation
Automated lymph node segmentation was performed on 162 baseline and 260 posttreatment pelvic MRI scans from the 162 APC patients. As shown in Table 3, the mean DSC and VS were 0.82 ± 0.09 and 0.88 ± 0.12, respectively. In the subgroup analyses, the DSC and VS values of the target lesions and nontarget lesions showed no significant difference (DSC: 0.85 vs. 0.82, P > 0.05; VS: 0.88 vs. 0.86, P > 0.05) but were significantly higher than those of nonpathological lesions (all P values < 0.05). The subgroups of baseline and posttreatment MRI scans showed no significant difference (all P values > 0.05). Examples of lymph node segmentation are shown in Fig. 3.
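For readers unfamiliar with the two segmentation metrics, the sketch below computes them from voxel sets using the standard definitions, DSC = 2|A ∩ B| / (|A| + |B|) and VS = 1 − ||A| − |B|| / (|A| + |B|). Representing masks as sets of voxel coordinates and the function names are assumptions made for illustration; the study's own implementation is not shown in the text.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice_similarity(pred: Set[Voxel], ref: Set[Voxel]) -> float:
    """Overlap-based metric: 2 * |pred & ref| / (|pred| + |ref|)."""
    total = len(pred) + len(ref)
    if total == 0:
        return 1.0  # convention here: two empty masks agree perfectly
    return 2.0 * len(pred & ref) / total


def volumetric_similarity(pred: Set[Voxel], ref: Set[Voxel]) -> float:
    """Volume-based metric: 1 - | |pred| - |ref| | / (|pred| + |ref|)."""
    total = len(pred) + len(ref)
    if total == 0:
        return 1.0
    return 1.0 - abs(len(pred) - len(ref)) / total


if __name__ == "__main__":
    # Toy masks only, not study data.
    automated = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
    manual = {(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 0)}
    print(dice_similarity(automated, manual))
    print(volumetric_similarity(automated, manual))
```

VS depends only on the two volumes, not on where they lie, which is why it can exceed DSC for the same pair of masks, as in the values reported above.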

Quantitative measurement of the lymph node segmentation
The mean short diameters of the automatically segmented and manually segmented target lesions were 23.53 mm (interquartile range, 17.61-26.55 mm) and 27.94 mm (interquartile range, 15.93-26.77 mm), respectively (P = 0.231). The mean short diameters of automatically segmented and manually segmented nontarget lesions were 11.91 mm (interquartile range, 10.85-13.14 mm) and 12.33 mm (interquartile range, 11.07-13.59 mm), respectively (P = 0.082). The agreement between the automatically segmented and manually segmented target lesions and nontarget lesions in terms of short diameter is shown in Fig. 4. The Bland-Altman analysis showed good consistency between the automated segmentation and manual segmentation, and most values were within the upper and lower limits of agreement (LOA).
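The Bland-Altman limits of agreement can be sketched as follows, using the percent measurement difference that the Discussion says was adopted. The conventional bias ± 1.96 SD limits and the sample standard deviation are assumptions, since the text does not state how the LOA were computed; the function name `bland_altman_percent` is hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman_percent(auto_mm: Sequence[float],
                         manual_mm: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower LOA, upper LOA) of the percent difference.

    Percent difference per node: 100 * (auto - manual) / mean(auto, manual).
    ASSUMPTION: limits of agreement are bias +/- 1.96 * sample SD.
    """
    if len(auto_mm) != len(manual_mm) or len(auto_mm) < 2:
        raise ValueError("need at least two paired measurements")

    differences = [
        100.0 * (a - m) / ((a + m) / 2.0)
        for a, m in zip(auto_mm, manual_mm)
    ]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    # Illustrative short diameters in mm, not study data.
    automated = [16.2, 21.0, 11.4, 12.9, 25.3]
    manual = [15.8, 22.1, 11.9, 12.5, 26.0]
    print(bland_altman_percent(automated, manual))
```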

Accuracy of the treatment response assessment
In this population, target lesions were evaluated in 75 APC patients with 112 pairs of pelvic MRI scans, nontarget lesions in 129 APC patients with 209 pairs, and nonpathological lesions in 162 APC patients with 260 pairs. As shown in Fig. 5, the accuracies of the automated segmentation-based response assessment were 0.92 (95% CI: 0.85-0.96), 0.91 (95% CI: 0.86-0.95) and 0.75 (95% CI: 0.46-0.92) for target lesions, nontarget lesions and nonpathological lesions, respectively.
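An accuracy with a 95% confidence interval of this kind can be computed as sketched below. The text does not say which interval method was used, so the Wilson score interval here is an assumption chosen for illustration, and the counts in the example are hypothetical rather than taken from the study.

```python
import math
from typing import Tuple


def accuracy_with_wilson_ci(n_correct: int, n_total: int,
                            z: float = 1.96) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) using the Wilson score interval.

    ASSUMPTION: the study's CI method is unspecified; Wilson is one
    common choice for a binomial proportion.
    """
    if n_total <= 0 or not 0 <= n_correct <= n_total:
        raise ValueError("need 0 <= n_correct <= n_total and n_total > 0")

    p = n_correct / n_total
    z2 = z * z
    denominator = 1.0 + z2 / n_total
    center = (p + z2 / (2.0 * n_total)) / denominator
    half_width = (
        z * math.sqrt(p * (1.0 - p) / n_total + z2 / (4.0 * n_total ** 2))
        / denominator
    )
    return p, center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts: 45 of 50 assessments match the reference.
    print(accuracy_with_wilson_ci(45, 50))
```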

Consistency of the treatment response assessment
As shown in Table 4, the consistency of treatment response assessment based on automated segmentation and manual segmentation was excellent for target lesions [K value: 0.92 (0.86–0.98)], good for nontarget lesions [0.82 (0.74–0.90)] and moderate for nonpathological lesions [0.71 (0.50–0.92)].

Discussion
MET-RADS-P is a guideline for the treatment response evaluation of systemic metastases of patients with APC, which involves the evaluation of primary focus, bone metastases, lymph node metastases and organ metastases. In this study, we established a semiautomatic pelvic lymph node treatment response evaluation process for patients with APC through lymph node segmentation based on deep learning. Our results showed that the accuracies of automated segmentation-based response assessment were high for target lesions, nontarget lesions and nonpathological lesions according to MET-RADS-P criteria and that the assessment achieved good consistency with the attending radiologist and fellow radiologist.
Based on the morphology and signal characteristics of all acquired images, the MET-RADS-P system maps unequivocal disease to 14 predefined body regions [8,15]. Analysis of lymph node metastases in the pelvis, the most common metastatic site, is crucial for clinical practice and drug studies in patients with APC [17]. Because many malignancies can enlarge lymph nodes, lymph node size is highly correlated with survival time, and radiologists and clinicians measure it to monitor disease progression or assess therapeutic options [18]. According to the Response Evaluation Criteria in Solid Tumors 1.1 (RECIST 1.1) guidelines, lymph nodes with a short-axis diameter of at least 10 mm are considered enlarged and clinically significant [19]. The MRI-based size standard for pathological lymph nodes defined by MET-RADS-P is similar to that of RECIST 1.1, but MET-RADS-P provides a more complete assessment of nodal metastasis response by including the nontarget and nonpathological nodes, which are usually assessed only qualitatively under RECIST 1.1 criteria.
According to the MET-RADS-P criteria, the core whole-body MRI protocol designed for bone and lymph node metastasis detection includes T1WI (GRE Dixon technique) and axial DWI [8]. DWI is a well-recognized and widely used sequence for pelvic lymph node imaging that offers both qualitative and quantitative assessment for disease characterization [14,20]. Therefore, in this study, we performed the treatment response assessment only on DWI images.
In this study, the established semiautomatic pelvic lymph node treatment response evaluation process according to MET-RADS-P criteria included two parts. First, a previously established pelvic lymph node segmentation model was used to perform the automatic segmentation of lymph nodes. The model achieved good segmentation performance here, especially for the target lesions, which is similar to the segmentation results reported in previous literature (the DSC and VS values for all visible lymph nodes were 0.76 ± 0.15 and 0.82 ± 0.14, respectively) [14], further highlighting its potential usefulness.
Second, based on the quantitative measurements obtained from the automated segmentation, we can directly evaluate the treatment response according to MET-RADS-P criteria, which can be more practical in clinical settings. A clinical radiology report provides a qualitative narrative, but does not provide standardized, quantitative information about the patient's progress or response to treatment [21]. Natural language processing and deep learning models have been employed in previous studies to estimate responses from clinical text [22,23]. These approaches can be feasible for quantitative assessment related to MET-RADS-P criteria but can be indirect.
Our proposed semiautomated algorithm achieved high Kappa values in terms of treatment response assessment with attending and fellow radiologists when measuring the same set of target and nontarget lesions. The consistency for nonpathological lesions was lower, which may be due to the relatively poor segmentation performance for these lesions. Tang et al. [24] proposed a deep learning-based method for semiautomated RECIST measurement and assessed it using the mean difference, in pixels, between the deep learning algorithm and manual measurement. Scores based on pixel difference, however, may not be reliable, as they are largely determined by data composition. In this study, we used Bland-Altman plotting based on percent measurement difference to address this issue, as suggested by Woo et al. [25]. The Bland-Altman analysis indicated good consistency between the automated segmentation and manual segmentation, and most values were within the upper and lower LOA.
This study has some limitations. First, the deep learning-based treatment response assessment focused only on the pelvic lymph nodes, and the other body regions covered by the MET-RADS-P guideline need to be investigated in the future. Second, we acknowledge that there remain opportunities for further model refinement, including lymph node registration between baseline and posttreatment images, which would enable fully automated lymph node treatment response evaluation. Finally, our results demonstrated that the semiautomated treatment response assessment can be achieved on the DWI sequence, but the value of other sequences (e.g. T1WI, DCE or T2WI) for response assessment also needs to be investigated in further studies.

Conclusion
In conclusion, we have developed a semiautomated deep learning-based model to estimate response assessments of pelvic lymph nodes in patients with APC. The accuracy of response assessments based on the automatically segmented lymph nodes showed close similarity to the manually segmented lymph nodes and yielded output comparable to the radiologists. These initial results provide a promising way to achieve a fully automated treatment response assessment algorithm according to MET-RADS-P criteria.